Weighted Experts: A Solution for the Spock Data Mining Challenge
Abstract
One of the most popular and trend-setting Internet applications is People Search on the World Wide Web. In its most general form, extracting information about persons from unstructured data is extremely challenging, and we are still far from satisfactory solutions. However, current retrieval technology can cope with restricted variants of the problem, and this paper deals with one such variant, the so-called multi-document person resolution. Given a set of Web documents, the task is to decide for each document pair whether or not the two documents refer to the same person. In 2007, Spock Inc. (Silicon Valley) launched a competition on this problem with a grand prize of $50,000. The task was to classify 100,000 Web pages by person within 4 hours on a standard PC, maximizing the F-measure. This paper describes the challenge and introduces the technology of the winning team from the Bauhaus University Weimar [see 1].
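Because the task is stated in terms of document pairs and scored by F-measure, it helps to see how a pairwise F-measure can be computed from a person assignment. The sketch below assumes that the score is the pairwise F1 over all document pairs (the abstract does not give the exact evaluation formula) and that both the prediction and the ground truth are given as mappings from document id to a person label; the function name `pairwise_f_measure` is hypothetical. It only illustrates the evaluation, not the winning team's weighted-experts method. Rather than enumerating all pairs (about 5 × 10^9 for 100,000 pages), it counts same-label pairs from label frequencies.

```python
from collections import Counter


def _pairs(n):
    # Number of unordered pairs among n items: n choose 2.
    return n * (n - 1) // 2


def pairwise_f_measure(predicted, truth):
    """Pairwise precision, recall and F1 for person resolution (assumed metric).

    predicted, truth: dicts mapping document id -> person label.
    Two documents form a "same person" pair when they share a label.
    Pair counts are derived from label frequencies, so the cost is linear
    in the number of documents instead of quadratic.
    """
    docs = list(truth)
    # Pairs the system claims refer to the same person.
    pred_same = sum(_pairs(c) for c in Counter(predicted[d] for d in docs).values())
    # Pairs that really refer to the same person.
    true_same = sum(_pairs(c) for c in Counter(truth[d] for d in docs).values())
    # Pairs that are same-person in both: share the (predicted, true) label combination.
    both_same = sum(
        _pairs(c) for c in Counter((predicted[d], truth[d]) for d in docs).values()
    )
    precision = both_same / pred_same if pred_same else 0.0
    recall = both_same / true_same if true_same else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

For example, with ground truth `{"d1": "A", "d2": "A", "d3": "B", "d4": "B"}` and a prediction that puts d1, d2 and d3 together and d4 alone, the prediction claims 3 same-person pairs of which 1 is correct, while 2 exist, giving precision 1/3, recall 1/2 and F-measure 0.4.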
Similar resources
Patent analysis using data mining to identify and determine relationships among technologies
Analyzing the relationships between technologies can provide insight into technological strategies and help maximize profits. Patents are integral parts of intellectual property rights and contain significant information about developed technologies. Due to the increasing number of patents, a data mining method is proposed for patent analysis. Thus, weighted association rules have been used for patent analysi...
Fuzzy Weighted Associative Classifier: a Predictive Technique for Health Care Data Mining
In this paper we extend the problem of classification using Fuzzy Association Rule Mining and propose the concept of the Fuzzy Weighted Associative Classifier (FWAC). Classification based on association rules is considered effective and advantageous in many cases. Associative classifiers are especially suited to applications where the model may assist domain experts in their decisions. Weigh...
Overlap-based feature weighting: The feature extraction of Hyperspectral remote sensing imagery
Hyperspectral sensors provide a large number of spectral bands. This massive and complex data structure of hyperspectral images presents a challenge to traditional data processing techniques. Therefore, reducing the dimensionality of hyperspectral images without losing important information is a very important issue for the remote sensing community. We propose to use overlap-based feature weigh...
A Survey of Frequent and Infrequent Weighted Itemset Mining Approaches
Itemset mining is a data mining method extensively used for learning important correlations in data. Initially, itemset mining focused on discovering frequent itemsets. Frequent weighted itemsets characterize data in which items may be weighted differently, through frequent correlations in the data. However, in some situations, for instance when certain cost functions need to be minimized for determining r...
Not-So-Linked Solution to the Linked Data Mining Challenge 2016
We present a solution for the Linked Data Mining Challenge 2016, which achieved 92.5% accuracy according to the submission system. The solution uses a hand-crafted dataset, which was created by scraping various websites for reviews. We use logistic regression to learn a classification model, and we publish all our results on GitHub.